Lab 7

There are a total of 20100 events, comprising 100 Higgs events and 20000 QCD events, so we can weight the data accordingly.
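As a minimal sketch of how such weights could be obtained, assuming each simulated sample is simply rescaled so that its events sum to the expected yield: the sample sizes `n_higgs_mc` and `n_qcd_mc` below are hypothetical placeholders, not values from this lab.

```python
# Expected yields quoted above.
N_HIGGS_EXPECTED = 100
N_QCD_EXPECTED = 20000


def per_event_weight(expected_yield, n_simulated):
    """Weight that makes n_simulated events sum to expected_yield."""
    if n_simulated <= 0:
        raise ValueError("sample must contain at least one event")
    return expected_yield / n_simulated


# Hypothetical sample sizes (assumption: replace with len() of the real samples).
n_higgs_mc = 100_000
n_qcd_mc = 100_000

w_higgs = per_event_weight(N_HIGGS_EXPECTED, n_higgs_mc)
w_qcd = per_event_weight(N_QCD_EXPECTED, n_qcd_mc)
print(w_higgs, w_qcd)
```

Every event in a sample gets the same weight here, so the weighted totals reproduce the 100 and 20000 expected events regardless of how many events were simulated.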

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html

Now, I will use these weights to plot the mass distributions.
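The `weights` argument of `matplotlib.pyplot.hist` (documented at the link above) adds each event's weight to its bin instead of adding 1. The sketch below spells out that bookkeeping in plain Python so the effect of the weights is explicit. The helper `weighted_hist` and the five toy masses are illustrative assumptions, not the lab data, and unlike matplotlib the last bin here excludes its right edge.

```python
def weighted_hist(values, weights, lo, hi, nbins):
    """Sum of weights per bin for equal-width bins on the half-open range [lo, hi)."""
    counts = [0.0] * nbins
    width = (hi - lo) / nbins
    for v, w in zip(values, weights):
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge.
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += w
    return counts


# Toy events (assumption): three signal-like and two background-like masses.
masses = [124.0, 125.5, 126.0, 90.0, 150.0]
weights = [0.001, 0.001, 0.001, 0.2, 0.2]

counts = weighted_hist(masses, weights, lo=50.0, hi=200.0, nbins=3)
print(counts)
```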

The plot above shows the whole data set, with the Higgs data visible at ~115.

Now I will calculate the significance.

Now let's see how this approximation compares to the data.

The two values are not equal, but they are pretty close. In particle physics, however, this approximation is not good enough. It is based on Gaussian statistics and only gives us a good idea of what the probability value is.
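The comparison can be sketched as follows, under the assumption that the exact value comes from the Poisson tail probability of observing at least s + b events when only the background b is expected, and that the approximation is s/sqrt(b). The function names are hypothetical, and only the standard library is used.

```python
import math
from statistics import NormalDist


def poisson_tail(n_obs, mu):
    """P(N >= n_obs) for N ~ Poisson(mu), with each term computed in log space."""
    # Sum far enough past the mean that the remaining terms are negligible.
    k_max = max(n_obs, int(mu + 50 * math.sqrt(mu) + 50))
    total = 0.0
    for k in range(n_obs, k_max + 1):
        total += math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
    return min(total, 1.0)


def z_from_p(p):
    """One-sided Gaussian significance (in sigma) for a p-value 0 < p < 1."""
    return -NormalDist().inv_cdf(p)


s, b = 100, 20000
z_poisson = z_from_p(poisson_tail(s + b, b))
z_approx = s / math.sqrt(b)  # Gaussian approximation
print(f"Poisson: {z_poisson:.4f} sigma, s/sqrt(b): {z_approx:.4f} sigma")
```

With a background this large the two numbers agree to within a few percent; the gap grows when the expected background in the selected region is small, which is where the approximation should not be trusted.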

Now, I will make some mass cuts to find the best significance.

We need to find cuts that increase this number.

We see that the optimal mass cuts are at 124.55 and 127, so we can keep this interval while cutting on the rest of the parameters to get the highest significance value possible.
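A window scan of this kind can be sketched as a brute-force search over (low, high) pairs that maximizes s/sqrt(b). Everything below is an illustrative assumption: the toy samples (a Gaussian peak at 125 over a flat background), the 1 GeV grid, and the function names. It will not reproduce the 124.55 and 127 found on the real data.

```python
import math
import random
from itertools import combinations

random.seed(0)

# Toy inputs (assumptions, not the lab data): Gaussian signal, flat background.
sig_mass = [random.gauss(125.0, 2.0) for _ in range(500)]
bkg_mass = [random.uniform(50.0, 200.0) for _ in range(5000)]
w_sig = 100 / len(sig_mass)    # scale to 100 expected Higgs events
w_bkg = 20000 / len(bkg_mass)  # scale to 20000 expected QCD events


def significance(lo, hi):
    """s / sqrt(b) for events with lo <= mass < hi (0 if no background is left)."""
    s = w_sig * sum(lo <= m < hi for m in sig_mass)
    b = w_bkg * sum(lo <= m < hi for m in bkg_mass)
    return s / math.sqrt(b) if b > 0 else 0.0


# Candidate window edges: 110, 111, ..., 140.
edges = [110 + i for i in range(31)]

# combinations() yields every pair with lo < hi because edges is sorted.
best_lo, best_hi = max(combinations(edges, 2), key=lambda w: significance(*w))
z_best = significance(best_lo, best_hi)
z_uncut = significance(-math.inf, math.inf)
print(f"best window [{best_lo}, {best_hi}): {z_best:.2f} vs uncut {z_uncut:.2f}")
```

A finer grid gives a more precise window at the cost of more evaluations, and on a small sample a very narrow window can look good purely because of a statistical fluctuation in the background.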

Rest of Features

Set A: First, I will plot the rest of the features to identify the most discriminative parameters and the cut intervals.

Looking at the other parameters, it is definitely possible to make cuts to improve the significance. First, let's try the transverse momentum.

Set B:

It is possible to make cuts on ee2, ee3, d2, angularity, t2, t3, t21, and ktDeltaR. First, I will make cuts on t21, as it is the most discriminative.

Next, I will make cuts on t3

Then t2

Now, ktDeltaR

Next, angularity

Next, d2

This is a very good significance. Let's now check the pT plot before and after event selection and compare the two.

We can see that the event selection was successful because much of the background has been eliminated while maintaining a good amount of the Higgs data. Now, let's compare the significance for the pT data before and after event selection.

We see that the cuts made improved the significance.
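One way to keep track of a sequence of cuts like this is a cutflow table that records the surviving signal and background yields after each step. The sketch below uses a handful of heavily weighted toy events, and the t21 and d2 thresholds are hypothetical values chosen for illustration; only the mass window is taken from the text above.

```python
import math

# Toy events (assumption): each dict holds feature values and an event weight "w".
higgs = [
    {"mass": 125.2, "t21": 0.25, "d2": 1.1, "w": 40.0},
    {"mass": 126.0, "t21": 0.60, "d2": 1.4, "w": 40.0},
    {"mass": 125.0, "t21": 0.90, "d2": 1.0, "w": 20.0},
]
qcd = [
    {"mass": 125.5, "t21": 0.70, "d2": 3.0, "w": 8000.0},
    {"mass": 126.5, "t21": 0.30, "d2": 4.2, "w": 1900.0},
    {"mass": 126.0, "t21": 0.40, "d2": 1.5, "w": 100.0},
    {"mass": 90.0, "t21": 0.80, "d2": 2.5, "w": 10000.0},
]

# Cuts are applied in order; the t21 and d2 thresholds are hypothetical.
cuts = [
    ("mass", lambda e: 124.55 <= e["mass"] <= 127.0),
    ("t21", lambda e: e["t21"] < 0.65),
    ("d2", lambda e: e["d2"] < 2.0),
]


def total_weight(events):
    """Weighted yield of a list of events."""
    return sum(e["w"] for e in events)


def cutflow(sig, bkg, cuts):
    """Return (cut name, signal yield, background yield) after each cumulative cut."""
    rows = [("none", total_weight(sig), total_weight(bkg))]
    for name, passes in cuts:
        sig = [e for e in sig if passes(e)]
        bkg = [e for e in bkg if passes(e)]
        rows.append((name, total_weight(sig), total_weight(bkg)))
    return rows


rows = cutflow(higgs, qcd, cuts)
for name, s, b in rows:
    print(f"{name:>5}: s={s:7.1f} b={b:8.1f} s/sqrt(b)={s / math.sqrt(b):.2f}")
```

Reading the table row by row shows which cut removes the most background per unit of signal lost, which is the same before/after comparison made for pT above.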

Lab 8

First, we add the high luminosity data to the analysis from Lab 7 by overlaying the observed data onto the training data.

I will overlay the observed data on the sample data with the optimal cuts applied and compare the significance.

Next, I will do this with the same parameters used in Lab 7 to improve the significance, and compare the result with the significance observed in Lab 7.

In this case, the significance actually decreases...

Significance decreases again.

Observed significance increases again

Low Luminosity Data

95% Confidence Level of signal yields

The confidence interval is given by the expression

$CI = \bar X \pm t \cdot \frac{std}{\sqrt{n}}$

where $\bar X$ is the data mean, t is the critical value of the Student's t-distribution for the chosen confidence level (with n - 1 degrees of freedom), std is the standard deviation of the data, and n is the data size.
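As a worked example with made-up numbers (not values from this lab), take $\bar X = 100$, std = 10, and n = 10. For a 95% confidence level and n - 1 = 9 degrees of freedom, the two-sided critical value is t ≈ 2.262, so

$CI = 100 \pm 2.262 \cdot \frac{10}{\sqrt{10}} \approx 100 \pm 7.15$

which gives an interval of roughly [92.85, 107.15].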

https://www.statology.org/confidence-intervals-python/

We see that the highest upper bound is in the expected Higgs signal and the lowest upper bound is in the expected QCD signal.